Spatial distribution data of cultural sites from the Paleolithic to Bronze Age in Xinjiang, China

Published maps of cultural sites in Xinjiang reveal a lack of systematic data collection on site distribution in the area, and no relevant data sets have been released. Existing written materials indicate that the area contains more cultural sites than these maps record. We therefore collected and collated this information. Our cultural site database provides the geographic location and corresponding geographic environment of each site in Xinjiang from the Paleolithic to the Bronze Age. The data record the process of human development and settlement in prehistoric Xinjiang, together with the landscape characteristics of the settlement environment and the scale, type, quantity, and spatial distribution of sites. These data are a basis for further understanding the spatial distribution of prehistoric humans in Xinjiang, and they also provide a reference for understanding prehistoric human behavior, the prehistoric man-land relationship, and the exchange between eastern and western civilizations. They are also of value for modern social planning, site protection, and resource utilization.


Methods
The site data used in this study are mainly based on the digitization of text data. The written materials come from many years of archaeological excavation results in Xinjiang. They are supplemented by "The Atlas of Chinese Cultural Relics Xinjiang Volume" published by the State Administration of Cultural Relics 12 , "The Compilation of Cultural Relics and Archaeology in Xinjiang" published by the Xinjiang Institute of Cultural Relics and Archaeology 18 , Huang Wenbi's Collection of Archaeological Research 19,20 , The Statistical Yearbook of Chinese Archaeology issued by the Chinese Archaeological Society, and the results of China's third national cultural relics survey 21-40 . Sites of unknown age are excluded. A total of 1655 sites are collected, comprising 12 Paleolithic sites, 52 Neolithic sites and 1591 Bronze Age sites, as shown in Fig. 1. Following the position description in the written record, we located each site on satellite imagery, recorded its longitude and latitude, and then checked the position against archaeological excavation reports and sampled field investigations. The data set mainly covers immovable cultural relics, but it also includes a small number of inscriptions and rock paintings that were relocated for protection.
Based on the calibrated spatial locations, the GIS data of the site locations were obtained through vectorization in ArcGIS. Taking the 30 m digital elevation model (DEM), soil type data, vegetation type data, and geomorphic type data of China as the geographical environment background data (Fig. 2), we then extracted the elevation, slope, profile curvature, aspect, soil type, geomorphic type and other information for each site in ArcGIS 10.2. First, based on the DEM data, the Slope and Aspect tools of Spatial Analyst in ArcGIS were used to produce slope, aspect, and profile curvature data, as shown in Fig. 3. Then the Extract Values to Points tool was used to extract the raster values of all geographic background data to the cultural site point data, and the point attributes were exported in CSV format to obtain a data set of site distribution and geographic environment information.
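The slope, aspect and point-extraction steps described above can be illustrated outside ArcGIS. The following is a minimal sketch, not the ArcGIS implementation: it assumes a common 3 × 3 finite-difference scheme (often attributed to Horn) for slope and aspect, and a simple cell lookup for extracting a raster value at a point. The function names, the grid layout (row 0 is the northern row) and the tiny DEM are all hypothetical, and whether the results match the ArcGIS tools exactly is an assumption.

```python
import math


def horn_slope_aspect(window, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) for the
    centre cell of a 3x3 elevation window. Row 0 is the northern row.
    Aspect is -1 for flat cells."""
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Weighted finite differences in the west-east and north-south directions.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, -1.0
    angle = math.degrees(math.atan2(dz_dy, -dz_dx))
    # Convert the mathematical angle to a compass bearing (0 = north).
    if angle > 90.0:
        aspect = 360.0 - angle + 90.0
    else:
        aspect = 90.0 - angle
    return slope, aspect


def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the cell that contains point (x, y).
    (x_min, y_max) is the upper-left corner of the grid."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the raster")
    return grid[row][col]


# Invented 3x3 DEM (metres) that rises 30 m per 30 m cell towards the east.
dem = [[0, 30, 60],
       [0, 30, 60],
       [0, 30, 60]]
slope_deg, aspect_deg = horn_slope_aspect(dem, cell_size=30)
centre_elevation = sample_raster(dem, x_min=0, y_max=90, cell_size=30, x=45, y=45)
```

In this invented example the surface rises towards the east, so the downslope direction is west. A real workflow would apply the same window to every interior cell of the DEM and would need projected coordinates so that the cell size is in metres.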
Digital elevation model (DEM) data with a resolution of 30 m were obtained from the geospatial data cloud website (http://www.gcloud.cn), and the 1:1 million-scale vegetation type spatial distribution data for China come from the Chinese Academy of Sciences Resource and Environmental Science Data Center (http://www.resdc.cn/Default.aspx) 41 . Chinese soil attribute data and landform type data come from the National Qinghai-Tibet Plateau Science Data Center (http://data.tpdc.ac.cn) 42,43 .

Data records
The raw data of this research have been released in the National Qinghai-Tibet Plateau Data Center (http://data.tpdc.ac.cn), https://doi.org/10.11888/HumanNat.tpdc.271910 45 . The website is http://data.tpdc.ac.cn/en/disallow/bb49a-6da-bfd4-4355-9d0c-988eef793ee1/. The database is stored in CSV format in The National Tibetan Plateau Data Center (TPDC) (http://data.tpdc.ac.cn/en/), and 16 pieces of information are recorded for each site: (1) Chronological order; (2) Longitude; (3) Latitude; (4) Site category; (5) Name; (6) Civilized age; (7) Cultural age(Equivalent to); (8) Area (m²); (9) Soil type code; (10) Landform Type Code; (11) Vegetation type code; (12) Altitude(m); (13) Slope(°); (14) Aspect; (15) Time(a, BP); (16) Section curvature. The "Chronological order" is based on the time of existence of each site, from the Paleolithic to the Bronze Age, as shown in Table 1. "Longitude" and "Latitude" are the longitude and latitude at which the cultural site sits, recorded in decimal format. Because some sites have been backfilled or buried and are no longer visible on the ground, most sites were confirmed on 2021 satellite imagery, so the latitude and longitude are approximate values. "Site category" is the type of cultural relic as defined by the classification in the "Law of the People's Republic of China on the Protection of Cultural Relics". The categories used here are ancient sites, ancient tombs, stone carvings, cave temples, and others, that is, 4 main types plus an "others" category. "Name" is the name of the site, based on the archaeological naming at the time the site was discovered, generally the village or natural location followed by the type, such as Dushanzi (place name) tomb (type). "Civilized age" indicates the civilization stage of the site, namely the Paleolithic, Neolithic or Bronze Age. "Cultural age (Equivalent to)" is the period in the chronology of Chinese history to which the cultural site belongs.
For some ancient humans and their remains, geological age is used instead. Because the archaeological classification of Paleolithic and Neolithic cultural sites in Xinjiang is vague 12 and the regional cultural types are not clearly established, this article matches each site one-to-one with the archaeological cultural types of China's Central Plains based on its chronological information 12,44,46 and uniformly replaces the cultural age with the corresponding cultural era of the Central Plains region. "Area (m²)" indicates the area of the cultural site. "Soil type code" represents the modern soil type code at the location of the cultural site; see the literature for the specific soil type. "Landform Type Code" represents the modern landform type code at the location of the cultural site 42 ; see the literature for the corresponding landform type 43 . "Vegetation type code" represents the modern vegetation type code at the location of the cultural site; for specific vegetation types, see the literature 41 . "Altitude(m)" is the altitude of the cultural site, in m. "Slope (°)" is the slope of the surface on which the cultural site is located, in degrees. "Aspect" is the aspect of the surface on which the cultural site is located. This article distinguishes eight aspects: North, East, South, West, Northeast, Southeast, Northwest, and Southwest. The classification standards are shown in Table 1. "Time (a, BP)" is the age of the cultural site, where the unit a is years and BP means Before Present; the specific periods are shown in Table 1. Because the age is derived from the cultural type or historical dynasty and most sites lack absolute dating, the site's age is an approximate value. "Section curvature" is the section curvature of the surface on which the cultural site is located, retained to 4 significant digits.
Fig. 3 The extracted slope, aspect and profile curvature of the study area.

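As an illustration of how the record structure described above might be used, the following sketch reads a few rows with Python's standard csv module, groups sites by "Civilized age", and assigns an eight-direction aspect class. Several things here are assumptions rather than facts about the released file: the exact header spellings, the idea that aspect is available in degrees, and the use of equal 45° sectors centred on the compass directions (the actual classification standards are given in Table 1). The three rows are invented placeholders, not records from the data set.

```python
import csv
import io
from collections import Counter

DIRECTIONS = ["North", "Northeast", "East", "Southeast",
              "South", "Southwest", "West", "Northwest"]


def classify_aspect(degrees):
    """Map an aspect in degrees clockwise from north to one of eight
    45-degree sectors (assumed boundaries). Negative values mean flat."""
    if degrees < 0:
        return "Flat"
    sector = int(((degrees % 360.0) + 22.5) // 45.0) % 8
    return DIRECTIONS[sector]


# Invented placeholder rows; the header spellings are assumptions.
SAMPLE_CSV = """Name,Civilized age,Longitude,Latitude,Aspect
Example site A,Bronze Age,80.10,40.20,350.0
Example site B,Bronze Age,81.50,41.00,100.0
Example site C,Neolithic,85.25,44.75,-1
"""


def load_sites(text):
    """Parse CSV text into dictionaries with numeric coordinates and an
    added 'Aspect class' field."""
    sites = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Longitude"] = float(row["Longitude"])
        row["Latitude"] = float(row["Latitude"])
        row["Aspect class"] = classify_aspect(float(row["Aspect"]))
        sites.append(row)
    return sites


sites = load_sites(SAMPLE_CSV)
sites_per_age = Counter(site["Civilized age"] for site in sites)
```

With the real file, `io.StringIO(text)` would be replaced by an opened file handle, and the header names would need to be checked against the downloaded CSV before use.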
Technical Validation
The information on cultural sites recorded in this study comes from the written materials released by the State Administration of Cultural Heritage of China, the Xinjiang Institute of Cultural Relics and Archaeology and the Chinese Archaeological Society. These materials are authoritative because the publishers are official Chinese institutions and leading academic groups. We also combined the excavation reports of some scholars, news information and field verification to confirm the location of each site. To extract geographic information, we used widely recognized geographic background data released by official scientific research institutions, so the background data are scientifically sound and standardized. However, the source data were produced in different years by different authors and institutions, with different observation techniques and without uniform standards for data recording. These problems inevitably affected our data collection and limit the accuracy of site positioning and of the data. In addition, because detailed records of on-site excavations and interviews with excavators are lacking for many sites, the cultural information recorded in the data is limited, and the spatial and temporal accuracy of the site information needs further improvement. Because of these two deficiencies, we are still collecting data and verifying sites in the field wherever possible, in order to keep updating the data and improving their quality.
The spatial position error is expected to be between 0 and 1 km. From the perspective of application, however, these data are better suited to macro-scale analysis over large areas than to micro-site studies that require high-precision locations, and micro-site studies do not involve such a large number of sites. From this point of view, the spatial error of the sites is acceptable. For micro-site analysis, the data can still provide either a precise spatial location or a general range, which is valuable guidance for researchers, especially scholars who are unfamiliar with the distribution of cultural sites in Xinjiang and are planning their first field investigations.
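The stated position error of up to 1 km can be checked for any site where an independently verified coordinate is available. The sketch below is one possible way to do this, not the authors' validation procedure: it uses the haversine great-circle formula on a spherical Earth with an assumed mean radius of 6371 km. The function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points given in
    decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_tolerance(recorded, verified, tolerance_km=1.0):
    """True if the recorded (lon, lat) lies within tolerance_km of the
    verified (lon, lat)."""
    return haversine_km(*recorded, *verified) <= tolerance_km


# Invented coordinates: a 0.005 degree latitude offset is roughly 0.56 km.
close_enough = within_tolerance((80.0, 40.0), (80.0, 40.005))
too_far = within_tolerance((80.0, 40.0), (80.0, 40.02))
```

The spherical approximation is accurate to well under the 1 km tolerance over such short distances, which is why a simple formula is sufficient for this kind of check.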

Usage Notes
At present, the data can be applied in the following areas.
(1) Research on the relationship between man and land in prehistoric times. Early humans lived in Xinjiang. While repeatedly adapting to climate change, they transformed the landscape and built and left behind a large number of sites. The site information collated in these data provides a data basis for research on the site selection characteristics of human settlements and on the temporal and spatial evolution and models of prehistoric human adaptation to, and transformation of, the living environment in this area. It can also help to infer and interpret the evolution of prehistoric human relations.
(2) The dispersal process and spatial distribution characteristics of prehistoric humans. The origin, expansion process, and spatial distribution of human beings are a focus of environmental archaeology, anthropology, history, and geography. First, the data record the age of each site, reflecting the sequence of site formation and the spatial process of human territorial expansion. The spatial distribution of sites is a potential indicator of the spatial scope, density and scale of human activities. Mining the site data can therefore inform studies of the diffusion process and activity characteristics of prehistoric humans. Second, Xinjiang is a corridor connecting China and Central Asia, so a thorough review of the cultural types of sites in this region can provide a reference for further understanding East-West exchanges and the evolution of civilization in Eurasia.
(3) Utilization and protection of the ruins. Prehistoric human sites are not only essential physical evidence for studying prehistoric humans. For modern society, they are also cultural heritage and important resources. Many large-scale sites are turned into museums or developed as tourism resources after excavation, so the site database can support the evaluation and development of tourism resources in the area. Site prediction based on the distribution characteristics of the existing sites can reveal areas with a high probability of containing sites, guide the selection of key areas for excavation, and inform social and economic construction planning. In this way, the conflict between the excavation and protection of the ruins and the use of land for economic construction can be alleviated.
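As a first step towards the density and distribution analyses mentioned above, site coordinates can be binned into a regular grid and counted. The following is a minimal sketch under stated assumptions: cells are defined in degrees of longitude and latitude (so they are not equal-area), the function name is hypothetical, and the coordinates are invented placeholders. A simple count like this is a descriptive summary, not a site prediction model.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg):
    """Count points per grid cell. Each cell is identified by the integer
    index (floor(lon / cell_deg), floor(lat / cell_deg))."""
    counts = Counter()
    for lon, lat in points:
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[key] += 1
    return counts


# Invented (longitude, latitude) pairs in decimal degrees.
example_points = [(80.1, 40.2), (80.4, 40.7), (81.5, 40.2)]
density = grid_counts(example_points, cell_deg=1.0)
densest_cell, densest_count = density.most_common(1)[0]
```

For quantitative work the coordinates would normally be projected to an equal-area system first, so that counts per cell are comparable across latitudes.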

Code availability
No code was used in the creation of this data.